Steps in making Panoramic photo from say 5 successive photos that overlap:

$\textbf{Step 1}.$ Identify interest points in from all points. In this work, I have used SURFFeatureDetector function from MATLAB. Then, you establish matching points using the euclidean norm or other metric. Hence, we get matching points from any two successive images at this points. This is essentially done using library functions. It is important to note here that you get sufficiently large number of interest points. This set will essentially have inliers and outliers which will make finding homography hard.

$\textbf{Step 2}.$ In order to get all photos stitched as panorama, we need to project all images into one single world frame and apply a suitable mosaicing technique for removing edges (we will see this later). Hence, the important task in hand is to find the homography from all images to one world frame. Usually, the center image is considered as world and all images are transformed to this. For finding homography automatically, we follow three steps as required by this assignment. First, we apply RANSAC algorithm for finding the inliers amongst the matching interest points we found. Second, using the inliers found, we apply linear least squares method (here I have used solving homogenous system of equations using SVD approach) to refine the homography. Third and last, we use the Levenberg Marquardt Nonlinear least squares method to refine the homography further. Details of each step will follow after the general procedure.

$\textbf{Step 3}.$ Using the above AuHoCa step, we get homography between successive images. We can manipulate these homography matrices to get a homography that will map given image to central image frame. Say, we got homography matrices $H_{21},H_{32},H_{43},H_{54}$, we can find homography $H_{31}=H_{32}*H_{21}$, $H_{41}=H_{43}*H_{31}$, $H_{51}=H_{54}*H_{51}$ and finally, we can now get any homography say, transformation from any image $i$ to particular $j$, $H_{ij}=H_{i1}*H_{1j}^{-1}$

$\textbf{Step 4}.$ Now, we transform all images to centre image in the sequence using the homography calculated above. Here, we find the interesting issue where we might have more than one pixel coming over other. We can do bilinear interpolation to solve this issue. Overall steps include: from the first and last image in the sequence, apply homography for the four corners of the image and get corresponding values in central image. By this way, we can set the x and y limits of the centre image. Then, we shift the origin to make sure the points having negative coordinates are taken into account. Finally, we transform pixel by pixel and do biliner interpolation for weighted average of pixel values.

$\textbf{RANSAC algorithm}$

Input: $n_{total}$ number of matching interest points in image 1 (world) and image 2 (image)

Output: Best homography $H_{12}$ from image 1 to image 2 and Set of best inliers.

Parameters: $N$ iterations, $p$ probability of being inlier, $M$ fraction of best inliers, $n$ correspondences for calculating $H_{12}$, $\epsilon$ probability of being outlier , $\delta$ selecting criteria

Steps:

  1. Set $\epsilon=0.1$ $p=0.99$, $\delta=20$ (tunable hyperparameters)

  2. Calculate $N=\frac{ln(1-p)}{ln[1-(1-\epsilon)^n]}$, $M=(1-\epsilon)n_{total}$

  3. Randomly select $n$ matching correspondences from $n_{total}$

  4. Find homography H using usual method (maybe pseudoinverse or linear least squares if n>4)

  5. Apply H to all $n_{total}$ world points in dataset to find predicted image coordinates (remember to normalize)

  6. Find error between predicted image coordinates and actual image coordinates for all points. Here, I used 2-norm distance as measure of error.

  7. Add to inlier set if error less than $\delta$.

  8. If the percentage of inlier set is greater than $M$, terminate. Otherwise repeat from 3 until $N$ iterations

$\textbf{Linear Least Squares algorithm}$

Input: Set of matching best $k$ inliers from RANSAC algorithm

Output: Refined homography $H_{12}$

Objective: To solve cross product $X'\times HX=0$ where $X = [x, y, z]^T$ world coordinates and $X' = [x', y', z']^T$ image coordinates.

Rewriting as system of equations, we get $AH=0$ where $A \in R^{2k*9}$ and $H\in R^{9*1}$ are given by:

$\begin{bmatrix} 0 &0 &0 &-z_i'x_i &-z_i'y_i &-z_i'z_i &y_i'x_i &y_i'y_i &y_i'z_i\\ z_i'x_i &z_i'y_i &z_i'z_i &0 &0 &0 &-x_i'x_i &-x_i'y_i &-x_i'z_i \end{bmatrix}\begin{bmatrix}h_{11}\\h_{12}\\h_{13}\\h_{21}\\h_{22}\\h_{23}\\h_{31}\\h_{32}\\h_{33}\end{bmatrix} = 0$

Solving this is trivial, as we need to decompose $A = USV^T$ and take the last column vector in V as H (corresponding to least singular value). Normalize H and reshape into 3*3.

$\textbf{Levenberg Marquardt method for homography refinement}$

For this assignment, I have used MATLAB function for LM method. Hence, only the overall functioning of the algorithm is discussed here.

Objective: Given a set of N inlier points, we are supposed to find a non-linear least squares fit that will minimize the following geometric error: $\sum^N_i ||\tilde{x_i}'-f^i(p) ||$ where $f(p)$ is a function that takes in 8-element parameter vector $p$ and gives image coordinates. We call the observed image coordinates as $\tilde{x}=[x',y']^T$ and $f(p)=[f_1(p),f_2(p)]^T$ as mapped points.

Rewriting in vector notation:

Non-linear least squares objective: $$\text{min}\underset{p} \space||X-f(p)||^2$$

Let $C(p)$ be the geometric distance between the true solution $p_t$ and some solution $p$. Using Gradient descent algorithm, we can do parameter update: $$p_{k+1} = p_{k}-\gamma_k \left.\Delta C\right\vert_{p=p_k}$$ till the parameter no longer changes or within small error-norm.

Considering the cost function as minimizing squared error, $$C(p)=\epsilon(p)^T\epsilon(p)$$ where error is $$\epsilon(p)=X-f(p)$$ Now, $$\Delta C(p) = 2 J_{\epsilon}^T\epsilon(p)$$ But we know, $$J_{\epsilon} = -J_{f(p)}$$ For simplicity, let us denote, $J_{f(p)} = J_{f}$

Hence we get Gradient descent update, $$p_{k+1} = p_{k}+2\gamma_k J_{f}^T\epsilon(p)$$ Since this is unstable at closer to true solution, we bring in Gauss newton variation.

From Taylor series, we have $$X \approx f(p+\delta_p)= f(p) + J_{f(p)}\delta_p$$ where $\delta_p$ is small step in direction of minimizing the parameter.

From above, we get $$\epsilon(p)=X-f(p) =J_{f}\delta_p $$ Using pseudoinverse, $$\delta_p = (J_{f}^TJ_{f})^{-1}J_{f}^T\epsilon(p)$$ Thus, Gauss Netwon update step is given as $$ p_{k+1} = p_{k}+(J_{f}^TJ_{f})^{-1}J_{f}^T\epsilon(p)$$ Since the above equation approaches Gradient descend when it is far from true (or when $J_{f}^TJ_{f}$ is diagonal, we need better way to handle this and given by Levenberg Marquardt method. The update equation is given by $$ p{k+1} = p{k}+(J{f}^TJ_{f} + \muk I)^{-1}J{f}^T\epsilon(p)$$ where selecting the damping coefficient $\mu_k$ is critical and given by equation $$\mu{k+1} = \mu{k}\textbf{max}{\frac{1}{3}, 1-(2\rho^{LM}{k+1}-1)^3}$$ where $$\rho^{LM}{k+1} = \frac{C(p)-C(p{k+1})}{\delta{p}^TJ_{f}^T\epsilon(p)+\delta_p^T\mu_k I\delta_p}$$ Note, $$\deltap = (J{f}^TJ_{f} + \muk I)^{-1}J{f}^T\epsilon(p)$$ Since I intend to paint overall picture of LM algorithm, the last two steps are not derived in detail. Since initialization of $p$ is important, in our case, we set the result from Linear least squares as starting point. Also, $$\mu_o=\tau \text{max}{ diag(J_f^TJ_f)}$$ where $0<\tau<1$

In this assignment, I attempted to code Levenberg Marquardt from scratch but ran into issues of exploding homography values. While that is still under debugging stage, I have used lsqcurvefit and selected LM algorithm for solving the problem.

Order of displaying results:

I have considered two sets of images for creating panorama. I have shown the correspondences of all five images of set 1, inlier-outlier marked images (2 pairs) and final output for both sets. The parameters used are same as described in algorithm.

In [29]:
from IPython.display import Image

Here, we view all images side-by-side (input images -set1 and set2):

In [30]:
Image(filename='Building_Montage.jpg')
Out[30]:
In [31]:
Image(filename='PurdueTower_Montage.jpg')
Out[31]:

Correspondence between images are plot (4 pairs)

In [32]:
Image(filename='PT_12.jpg')
Out[32]:
In [33]:
Image(filename='PT_23.jpg')
Out[33]:
In [34]:
Image(filename='PT_34.jpg')
Out[34]:
In [35]:
Image(filename='PT_45.jpg')
Out[35]:

The inliers and outliers are plotted for Image 1 and Image 2, Image 2 and Image 3.

In [36]:
Image(filename='inlier_outlier_1.jpg')
Out[36]:
In [37]:
Image(filename='inlier_outlier_21.jpg')
Out[37]:
In [38]:
Image(filename='inlier_outlier_23.jpg')
Out[38]:
In [39]:
Image(filename='inlier_outlier_32.jpg')
Out[39]:

Final output of set 1 and set 2: Note, the image in set 1 has better output as the image seems to be taken far from object and also significantly higher overlap seen between successive images. In set 2, it seems the image taken is from very small distance to object and hence, the skewed output. Additional photos with more overlap could be a solution to tackle this. It was not tried at the moment due to shortage of time.

In [40]:
Image(filename='Mosaic_Building.jpg')
Out[40]:
In [41]:
Image(filename='Purdue_tower.jpg')
Out[41]: